CLAIMS: 

1. A method of predicting the occurrence of critical events in a computer cluster having a series of nodes, said method comprising:
maintaining an event log that contains information concerning critical events that have occurred in the computer cluster;
maintaining a system parameter log that contains information concerning system parameters for each node in the cluster; and
predicting a future performance of a node in the cluster based upon said event log and said system parameter log.

2. The method of claim 1 comprising developing a Bayesian network model that represents said computer cluster and said nodes based upon the information in said event log and said system parameter log.

3. The method of claim 1 wherein maintaining said system parameter log comprises recording a temperature of a node in the cluster and a corresponding time value.

4. The method of claim 1 wherein maintaining said system parameter log comprises recording a utilization parameter of a central processing unit of a node in the cluster and a corresponding time value.

5. The method of claim 1 comprising filtering said event log and said system parameter log such that some critical event information and some system parameter information is not maintained in said event log and said system parameter log.

6. The method of claim 1 comprising using a time-series mathematical model to predict future values of said system parameters.

7. The method of claim 1 comprising using a rule based classification system to predict future critical events based upon said critical event information and said system parameter information.

8. The method of claim 1 wherein the step of predicting comprises forming a warning window for each node in the cluster such that said warning window contains a predicted performance parameter or critical event occurrence for the node for a predetermined future period of time.

9. A method of improving the performance of a computer cluster having a series of nodes comprising:
monitoring the occurrence of critical events in said nodes in said computer cluster;
monitoring system performance parameters of said nodes in said computer cluster;
creating a node representation for each node in said computer cluster based upon said monitoring;
creating a cluster representation based on said node representations;
periodically examining said node representations to predict future node performance; and
using said cluster representation to redistribute tasks among said nodes based upon said predicted node performance.

10. The method of claim 9 wherein creating said cluster representation and said node representation comprises creating a Bayesian Network that represents relationships between the occurrence of said critical events and said system performance parameters.

11. The method of claim 9 comprising saving information concerning said critical events and said system performance parameters in a database.

12. The method of claim 11 comprising filtering said saved information to remove information wherein said removed information is not determined to be useful in predicting a future performance of said nodes.

13. The method of claim 9 comprising applying a time-series mathematical model to said system performance parameters to predict future values of said system performance parameters.

14. The method of claim 13 wherein said time series mathematical model is one of an auto regression, a moving average and an autoregressive moving average model.

15. The method of claim 9 comprising using rule based classifications to associate some system performance parameters with occurrence of said critical events.

16. The method of claim 9 wherein said system performance parameters concern at least one of a node temperature, processor utilization value, network bandwidth and available memory space.

17. An information processing system comprising:
a computer cluster having a series of nodes;
a control system for monitoring critical events that occur in said computer cluster and system parameters of said nodes;
a memory for storing information related to said occurrence of said critical events and said system parameters of said nodes; and
a Bayesian Network model for predicting a future occurrence of a critical event based upon an observed relationship between said system parameters and said occurrence of critical events.

18. The information processing system of claim 17 comprising a filter for removing redundant information from said stored information.

19. The information processing system of claim 17 wherein said Bayesian Network comprises a time-series modeler for predicting future values of said system parameters.

20. The information processing system of claim 17 wherein said Bayesian Network comprises a rule based classification system for associating said system parameters with said occurrences of said critical events.

21. The information processing system of claim 17 comprising a dynamic probe generator for determining when to collect additional information concerning said system parameters or said critical event occurrence.